1 Document re-ranking based on automatically acquired key terms in Chinese
information retrieval 
Yang Lingpeng, Ji Donghong, Tang Li 

August 2004 Proceedings of the 20th international conference on Computational 
Linguistics COLING '04 
Publisher: Association for Computational Linguistics 

Full text available: pdf(131.48 KB) Additional Information: full citation, abstract, references



For Information Retrieval, users are more concerned about the precision of top ranking 
documents in most practical situations. In this paper, we propose a method to improve 
the precision of top N ranking documents by reordering the retrieved documents from the 
initial retrieval. To reorder documents, we first automatically extract Global Key Terms 
from document set, then use extracted Global Key Terms to identify Local Key Terms in a 
single document or query topic, fi ... 



2 ESSQL: an enhanced semi-structured query language for composite document 
retrievals

Reo-Jo Yamashita, Tetsuro Ito, Hsiu-Hsen Yao 

September 1998 Proceedings of the 16th annual international conference on 
Computer documentation SIGDOC '98 

Publisher: ACM Press 

Full text available: pdf(632.66 KB) Additional Information: full citation, references, index terms







Formal models-2: Tuning before feedback: combining ranking discovery and blind 
feedback for robust retrieval 



Weiguo Fan, Ming Luo, Li Wang, Wensi Xi, Edward A. Fox 

July 2004 Proceedings of the 27th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '04 
Publisher: ACM Press 



Full text available: pdf(306.72 KB) Additional Information: full citation, abstract, references, index terms



Both ranking functions and user queries are very important factors affecting a search 
engine's performance. Prior research has looked at how to improve ad-hoc retrieval 
performance for existing queries while tuning the ranking function, or modify and expand 
user queries using a fixed ranking scheme using blind feedback. However, almost no 
research has looked at how to combine ranking function tuning and blind feedback 
together to improve ad-hoc retrieval performance. In this paper, we look at th ...

Keywords: blind feedback, genetic programming, information retrieval, intelligent
information retrieval, query expansion, ranking function, search engine 




Retrieval of complex objects using a four-valued logic 
Thomas Rolleke, Norbert Fuhr 

August 1996 Proceedings of the 19th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '96 
Publisher: ACM Press 

Full text available: pdf(892.56 KB) Additional Information: full citation, references, citings, index terms





Object-oriented and database concepts for the design of networked information
retrieval systems
Norbert Fuhr 

November 1996 Proceedings of the fifth international conference on Information and 
knowledge management CIKM '96 

Publisher: ACM Press 

Full text available: pdf(1.04 MB) Additional Information: full citation, references, citings, index terms



6 Information access and retrieval (IAR): Shadow document methods of results
merging

Shengli Wu, Fabio Crestani

March 2004 Proceedings of the 2004 ACM symposium on Applied computing SAC '04 



Publisher: ACM Press 

Full text available: pdf(176.76 KB) Additional Information: full citation, abstract, references, index terms



In distributed information retrieval systems, document overlaps occur frequently across 
results from different databases. This is especially the case for meta-search engines which 
merge results from several general-purpose web search engines. This paper addresses 
the problem of merging results which contain overlaps in order to achieve better 
performance. Several algorithms for merging results are proposed, which take advantage 
of the use of duplicate documents in two ways: one correlates scores ... 



Keywords: data fusion, information retrieval 




Metadata for integrating speech documents in a text retrieval system

Ulrike Glavitsch, Peter Schauble, Martin Wechsler 
December 1994 ACM SIGMOD Record, Volume 23 issue 4 



Publisher: ACM Press 

Full text available: pdf(603.39 KB) Additional Information: full citation, abstract, citings, index terms

We present an information retrieval system that simultaneously allows to search for text 
and speech documents. The retrieval system accepts vague queries and performs a best- 
match search to find those documents that are relevant to the query. The output of the 
retrieval system is a list of ranked documents where the documents on the top of the list 
satisfy best the user's information need. The relevance of the documents is estimated by 
means of metadata (document description vectors). The metada ... 



Enhanced hypertext categorization using hyperlinks 
Soumen Chakrabarti, Byron Dom, Piotr Indyk

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, volume 27 issue 2 

Publisher: ACM Press 

Full text available: pdf(1.91 MB) Additional Information: full citation, abstract, references, citings, index

A major challenge in indexing unstructured hypertext databases is to automatically 
extract meta-data that enables structured search using topic taxonomies, circumvents 
keyword ambiguity, and improves the quality of search and profile-based routing and 
filtering. Therefore, an accurate classifier is an essential component of a hypertext 
database. Hyperlinks pose new problems not addressed in the extensive text classification 
literature. Links clearly contain high-quality semantic clues that ... 



9 E-rulemaking: Near-duplicate detection for eRulemaking

Hui Yang, Jamie Callan 

May 2005 Proceedings of the 2005 national conference on Digital government 
research dg.o2005 

Publisher: Digital Government Research Center 

Full text available: pdf(248.00 KB) Additional Information: full citation, abstract, references, citings, index terms




U.S. regulatory agencies are required to solicit, consider, and respond to public comments 
before issuing regulations. In recent years, agencies have begun to accept comments via 
both email and Web forms. The transition from paper to electronic comments makes it 
much easier for individuals to customize "form" letters, which they do, creating "near- 
duplicate" comments that express the same viewpoint in slightly different languages. This 
paper explores the use of simple text clustering and retriev ... 



Keywords: eRulemaking, information retrieval, near duplicate detection, public 
comments 




Applying summarization techniques for term selection in relevance feedback 



Adenike M. Lam-Adesina, Gareth J. F. Jones 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval SIGIR '01 

Publisher: ACM Press 



Full text available: pdf(253.28 KB)



Additional Information: full citation , abstract , references , citings , index 
Query-expansion is an effective Relevance Feedback technique for improving performance
in Information Retrieval. In general query-expansion methods select terms from the 
complete contents of relevant documents. One problem with this approach is that 
expansion terms unrelated to document relevance can be introduced into the modified 
query due to their presence in the relevant documents and distribution in the document 
collection. Motivated by the hypothesis that query-expansion terms should ... 



11 




XML constraints and the semantic web: Information retrieval on the semantic web 



Urvi Shah, Tim Finin, Anupam Joshi 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management CIKM '02 

Publisher: ACM Press 



Full text available: pdf(192.40 KB) Additional Information: full citation, abstract, references, citings



We describe an approach to retrieval of documents that contain both free text and
semantically enriched markup. In particular, we present the design and implementation 
prototype of a framework in which both documents and queries can be marked up with 
statements in the DAML+OIL semantic web language. These statements provide both
structured and semi-structured information about the documents and their content. We 
claim that indexing text and semantic markup together will significantly improve ... 

Keywords: hybrid information retrieval, query-answering systems, semantic web, text 
extraction 
Information retrieval: Assessing the retrieval effectiveness of a speech retrieval
system by simulating recognition errors



Peter Schauble, Ulrike Glavitsch 

March 1994 Proceedings of the workshop on Human Language Technology HLT '94 
Publisher: Association for Computational Linguistics 

Full text available: pdf(287.44 KB) Additional Information: full citation, abstract, references



We show how the recognition performance of a speech recognition component in a speech 
retrieval system affects the retrieval effectiveness. A speech retrieval system facilitates 
content-based retrieval of speech documents, i.e. audio recordings containing spoken 
text. The speech retrieval process receives queries from users and for every query it 
ranks the speech documents in decreasing order of their probabilities that they are 
relevant to the query. The speech recognition component is an impor ... 




A system for retrieving speech documents 



Ulrike Glavitsch, Peter Schauble 

June 1992 Proceedings of the 15th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '92 
Publisher: ACM Press 



Full text available: pdf



Additional Information: full citation , abstract , references , citings , index 
terms 



An information retrieval model is presented for the retrieval of speech documents, i.e. 
audio recordings containing speech. The indexing vocabulary consists of indexing features 
that have the following characteristics. First, they are easy to recognize by speech 
recognition methods. Second, the number of different indexing features is small such that 
a reasonable amount of training data is sufficient to train the hidden Markov models that
are used by the speech recognition process. Third, th ... 




Toward an improved concept-based information retrieval system 



Peter V. Henstock, Daniel J. Pack, Young-Suk Lee, Clifford J. Weinstein 
September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval SIGIR '01 
Publisher: ACM Press 



Full text available: pdf(150.27 KB) Additional Information: full citation, abstract, references, index terms



This paper presents a novel information retrieval system that includes 1) the addition of 
concepts to facilitate the identification of the correct word sense, 2) a natural language 
query interface, 3) the inclusion of weights and penalties for proper nouns that build upon 
the Okapi weighting scheme, and 4) a term clustering technique that exploits the spatial 
proximity of search terms in a document to further improve the performance. The 
effectiveness of the system is validated by experime ... 



Keywords: Roget's thesaurus, WordNet, brill tagger, concept, information retrieval, word
sense disambiguation 
Links for a better web: Refinement of TF-IDF schemes for web pages using their
hyperlinked neighboring pages



Kazunari Sugiyama, Kenji Hatano, Masatoshi Yoshikawa, Shunsuke Uemura 
August 2003 Proceedings of the fourteenth ACM conference on Hypertext and 
hypermedia HYPERTEXT '03 
Publisher: ACM Press 



Full text available: pdf(211.25 KB)



Additional Information: full citation , abstract , references , citings , index 
terms 



In IR (information retrieval) systems based on the vector space model, the TF-IDF 
scheme is widely used to characterize documents. However, in the case of documents
with hyperlink structures such as Web pages, it is necessary to develop a technique for
representing the contents of Web pages more accurately by exploiting the contents of
their hyperlinked neighboring pages. In this paper, we first propose several approaches to
refining the TF-IDF scheme for a target Web page by using the contents ...



Keywords: TF-IDF scheme, WWW, hyperlink, information retrieval 




Web search 2: Entropy-based link analysis for mining web informative structures

Hung-Yu Kao, Ming-Syan Chen, Shian-Hua Lin, Jan-Ming Ho 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management CIKM '02 

Publisher: ACM Press 



Full text available: pdf(563.64 KB)



Additional Information: full citation , abstract , references , citings , index 
terms 



In this paper, we study the problem of mining the informative structure of a news Web 
site which consists of thousands of hyperlinked documents. We define the informative 
structure of a news Web site as a set of index pages (or referred to as TOC, i.e., table of
contents, pages) and a set of article pages linked by TOC pages through informative links. 
It is noted that the Hyperlink Induced Topics Search (HITS) algorithm has been employed 
to provide a solution to analyzing authorities and hubs of ... 

Keywords: anchor text, entropy, hubs and authorities, information extraction, 
informative structure, link analysis 



17 Biomedical text retrieval in languages with a complex morphology

Stefan Schulz, Martin Honeck, Udo Hahn 

July 2002 Proceedings of the ACL-02 workshop on Natural language processing in 
the biomedical domain - Volume 3 

Publisher: Association for Computational Linguistics 

Full text available: pdf(129.10 KB) Additional Information: full citation, abstract, references, citings

Document retrieval in languages with a rich and complex morphology - particularly in 
terms of derivation and (single-word) composition - suffers from serious performance 
degradation with the stemming-only query-term-to-text-word matching paradigm. We
propose an alternative approach in which morphologically complex word forms are 
segmented into relevant subwords (such as stems, named entities, acronyms), and 
subwords constitute the basic unit for indexing and retrieval. We evaluate our approach 
o ... 

18 A technique for automatically organizing software libraries for software reuse

Khuzaima S. Daudjee, Anestis A. Toptsis 

October 1994 Proceedings of the 1994 conference of the Centre for Advanced Studies
on Collaborative research CASCON '94 

Publisher: IBM Press 
As software reuse becomes more prominent and accepted in industry, systems and tools
for software reuse become a key aspect in achieving successful reuse of software 
artifacts. A major problem with such tools is the retrieval and classification of the software 
modules. To search for and retrieve the conceptually closest software component from a 
library of software modules, components need to be classified in some manner. We 
address this problem by showing how the software library can be organiz ... 

Keywords: clustering, software libraries, software reuse 




Web structure: Block-based web search 



Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma 

July 2004 Proceedings of the 27th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '04 
Publisher: ACM Press 



Full text available: pdf(225.50 KB)



Additional Information: full citation , abstract , references , citings , index 
terms 



Multiple-topic and varying-length of web pages are two negative factors significantly 
affecting the performance of web search. In this paper, we explore the use of page 
segmentation algorithms to partition web pages into blocks and investigate how to take 
advantage of block-level evidence to improve retrieval performance in the web context. 
Because of the special characteristics of web pages, different page segmentation methods
will have different impact on web search performance. We compare four ...



Keywords: Vision-based page segmentation, page segmentation, passage retrieval,
query expansion, web information retrieval 




Machine learning for IR: Learning effective ranking functions for newsgroup search

Wensi Xi, Jesper Lind, Eric Brill

July 2004 Proceedings of the 27th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '04 
Publisher: ACM Press 



Full text available: pdf(281.11 KB)



Additional Information: full citation , abstract , references , citings , index 
terms 



Web communities are web virtual broadcasting spaces where people can freely discuss 
anything. While such communities function as discussion boards, they have even greater 
value as large repositories of archived information. In order to unlock the value of this 
resource, we need an effective means for searching archived discussion threads. 
Unfortunately the techniques that have proven successful for searching document 
collections and the Web are not ideally suited to the task of searching archive ... 



Keywords: information retrieval, linear regression, machine learning, newsgroup search, 
support vector machines 
Interactive document retrieval with relational learning

Masayuki Okabe, Seiji Yamada 

March 2001 Proceedings of the 2001 ACM symposium on Applied computing SAC '01 

Publisher: ACM Press 

Full text available: pdf(162.72 KB) Additional Information: full citation, references, index terms




Keywords: information retrieval, relational learning, relevance feedback 



2 THESUS: Organizing Web document collections based on link semantics

Maria Halkidi, Benjamin Nguyen, Iraklis Varlamis, Michalis Vazirgiannis 
November 2003 The VLDB Journal — The International Journal on Very Large Data 
Bases, volume 12 Issue 4 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(262.85 KB) Additional Information: full citation, abstract, citings, index terms

The requirements for effective search and management of the WWW are stronger than 
ever. Currently Web documents are classified based on their content not taking into
account the fact that these documents are connected to each other by links. We claim 
that a page's classification is enriched by the detection of its incoming links' semantics.
This would enable effective browsing and enhance the validity of search results in the
WWW context. Another aspect that is underaddressed and str ... 

Keywords: Document clustering, Link analysis, Link management, Semantics, Similarity
measure, World Wide Web
Crawling: Accelerated focused crawling through online relevance feedback

Soumen Chakrabarti, Kunal Punera, Mallela Subramanyam 

May 2002 Proceedings of the 11th international conference on World Wide Web 
WWW '02

Publisher: ACM Press 



Full text available: pdf(589.82 KB)



Additional Information: full citation, abstract, references, citings, index
terms 



The organization of HTML into a tag tree structure, which is rendered by browsers as 
roughly rectangular regions with embedded text and HREF links, greatly helps surfers 
locate and click on links that best satisfy their information need. Can an automatic
program emulate this human behavior and thereby learn to predict the relevance of an
unseen HREF target page w.r.t. an information need, based on information limited to the
HREF source page? Such a capability would be of great interest in focuse ... 

Keywords: document object model, focused crawling, reinforcement learning




Set-based vector model: An efficient approach for correlation-based ranking 

Bruno Possas, Nivio Ziviani, Wagner Meira, Berthier Ribeiro-Neto 

October 2005 ACM Transactions on Information Systems (TOIS), volume 23 issue 4 



Publisher: ACM Press 

Full text available: pdf



Additional Information: full citation , abstract , references , citings, index 
This work presents a new approach for ranking documents in the vector space model. The
novelty lies in two fronts. First, patterns of term co-occurrence are taken into account and 
are processed efficiently. Second, term weights are generated using a data mining 
technique called association rules. This leads to a new ranking mechanism called the set- 
based vector model. The components of our model are no longer index terms but index 
termsets, where a termset is a set of index terms. Termset ... 



Keywords: Information retrieval models, association rule mining, correlation-based
ranking, data mining, weighting index term co-occurrences 




Posters: Term proximity scoring for ad-hoc retrieval on very large text collections 

Stefan Buttcher, Charles L. A. Clarke, Brad Lushman 

August 2006 Proceedings of the 29th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '06 
Publisher: ACM Press 



Full text available: ^ pdf(102.16 KB) Additional Information: full citation , abstract , references , index terms 




We propose an integration of term proximity scoring into Okapi BM25. The relative 
retrieval effectiveness of our retrieval method, compared to pure BM25, varies from 
collection to collection. We present an experimental evaluation of our method and show 
that the gains achieved over BM25 grow as the size of the underlying text collection increases.
We also show that for stemmed queries the impact of term proximity scoring is larger 
than for unstemmed queries.



Keywords: information retrieval, query processing, term proximity 



6 Document detection: Inquery system overview

John Broglio, James P. Callan, W. Bruce Croft

September 1993 Proceedings of a workshop on held at Fredericksburg, Virginia: 
September 19-23, 1993 

Publisher: Association for Computational Linguistics 

Full text available: pdf(1.57 MB) Additional Information: full citation, abstract, references

The TIPSTER project in the Information Retrieval Laboratory of the Computer Science 
Department, University of Massachusetts, Amherst (which includes MCC as a 
subcontractor), has focused on the following goals:* Improving the effectiveness of 
information retrieval techniques for large, full-text databases,* Improving the 
effectiveness of routing techniques appropriate for long-term information needs, and* 
Demonstrating the effectiveness of these retrieval and routing techniques for ... 
CIKM workshop reports: GIR'05 2005 ACM workshop on geographical information
retrieval 

Christopher B. Jones, Ross Purves 

June 2006 ACM SIGIR Forum, Volume 40 Issue 1 

Publisher: ACM Press 

Full text available: pdf(162.90 KB) Additional Information: full citation, abstract, index terms

Geographical information retrieval (GIR) is concerned with the problems of finding 
information resources that relate to particular geographical locations. Until recently most 
web search engines have treated geographical terminology within user queries in the 
same way as other terminology. This can often result in failure to find relevant documents 
and in the retrieval of irrelevant documents. There are several reasons for this. For 
example, there are many different places with the same name, so ... 




Search engineering 2: TeXQuery: a full-text search extension to XQuery

S. Amer-Yahia, C. Botev, J. Shanmugasundaram 

May 2004 Proceedings of the 13th International conference on World Wide Web 
WWW '04 

Publisher: ACM Press 



Full text available: pdf(117.50 KB)
One of the key benefits of XML is its ability to represent a mix of structured and
unstructured (text) data. Although current XML query languages such as XPath and 
XQuery can express rich queries over structured data, they can only express very 
rudimentary queries over text data. We thus propose TeXQuery, which is a powerful
full-text search extension to XQuery. TeXQuery provides a rich set of fully composable
full-text search primitives, such as Boolean Connectives, phrase matching, proximity di ...



Keywords: full-text search, xquery
Ramakrishna Varadarajan, Vagelis Hristidis

October 2005 Proceedings of the 14th ACM international conference on Information 
and knowledge management CIKM '05 
Publisher: ACM Press 



Full text available: 
Summarization of text documents is increasingly important with the amount of data
available on the Internet. The large majority of current approaches view documents as 
linear sequences of words and create query-independent summaries. However, ignoring 
the structure of the document degrades the quality of summaries. Furthermore, the 
popularity of web search engines requires query-specific summaries. We present a 
method to create query-specific summaries by adding structure to documents by 
extract ... 



Keywords: Steiner tree problem, adding structure to documents, keyword search, query- 
specific summarization, user survey 
Research track: SEWeP: using site semantics and a taxonomy to enhance the Web
personalization process

M. Eirinaki, M. Vazirgiannis, I. Varlamis 

August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '03 
Publisher: ACM Press

Web personalization is the process of customizing a Web site to the needs of each specific
user or set of users, taking advantage of the knowledge acquired through the analysis of 
the user's navigational behavior. Integrating usage data with content, structure or user 
profile data enhances the results of the personalization process. In this paper, we present 
SEWeP, a system that makes use of both the usage logs and the semantics of a Web 
site’s content in order to personalize it. Web content is ... 



Keywords: Web mining, Web personalization, concept hierarchies, semantic annotation 
of Web content 



Automatic evaluation of students' answers using syntactically enhanced LSA

Dharmendra Kanejiya, Arun Kumar, Surendra Prasad 

May 2003 Proceedings of the HLT-NAACL 03 workshop on Building educational 
applications using natural language processing - Volume 2 
Publisher: Association for Computational Linguistics 

Full text available: pdf(149.34 KB) Additional Information: full citation, abstract, references

Latent semantic analysis (LSA) has been used in several intelligent tutoring systems 
(ITS's) for assessing students' learning by evaluating their answers to questions in the
tutoring domain. It is based on word-document co-occurrence statistics in the training 
corpus and a dimensionality reduction technique. However, it doesn't consider the word- 
order or syntactic information, which can improve the knowledge representation and 
therefore lead to better performance of an ITS. We present here an app ...




XML and text: XRANK: ranked keyword search over XML documents 
Lin Guo, Feng Shao, Chavdar Botev, Jayavel Shanmugasundaram 
June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data SIGMOD '03 
Publisher: ACM Press 



Full text available: pdf(265.38 KB)



Additional Information: full citation , abstract , references , citings , index 
We consider the problem of efficiently producing ranked results for keyword search
queries over hyperlinked XML documents. Evaluating keyword search queries over 
hierarchical XML documents, as opposed to (conceptually) flat HTML documents, 
introduces many new challenges. First, XML keyword search queries do not always return 
entire documents, but can return deeply nested XML elements that contain the desired 
keywords. Second, the nested structure of XML implies that the notion of ranking is no 
I ... 




Beyond document similarity: understanding value-based search and browsing 
technologies 

Andreas Paepcke, Hector Garcia-Molina, Gerard Rodriguez-Mula, Junghoo Cho 
March 2000 ACM SIGMOD Record, Volume 29 Issue 1 



Publisher: ACM Press 

Full text available: pdf(1.29 MB) Additional Information: full citation, abstract, citings, index terms

In the face of small, one or two word queries, high volumes of diverse documents on the 
Web are overwhelming search and ranking technologies that are based on document 
similarity measures. The increase of multimedia data within documents sharply 
exacerbates the shortcomings of these approaches. Recently, research prototypes and 
commercial experiments have added techniques that augment similarity-based search and 
ranking. These techniques rely on judgments about the 'value' of documents. Jud ...

Keywords: World-Wide Web, collaborative filtering, hypertext, information filters,
information retrieval, links, metadata, ranking, relevance, search engines

August 1998 Proceedings of the 21st annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '98
Publisher: ACM Press 

Full text available: pdf(237.08 KB) Additional Information: full citation , references , citings , index terms 





Probabilistic and genetic algorithms in document retrieval

M. Gordon 

October 1988 Communications of the ACM, Volume 31 Issue 10
Publisher: ACM Press 



Full text available: pdf(1.27 MB)

Document retrieval systems are built to provide inquirers with computerized access to
relevant documents. Such systems often miss many relevant documents while falsely 
identifying many non-relevant documents. Here, competing document descriptions are 
associated with a document and altered over time by a genetic algorithm according to the 
queries used and relevance judgments made during retrieval. 



Efficiency: Set-based model: a new approach for information retrieval

Bruno Possas, Nivio Ziviani, Wagner Meira, Berthier Ribeiro-Neto

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '02 

Publisher: ACM Press 



Full text available: pdf(123.55 KB)



Additional Information: full citation , abstract , references , citings , index 

The objective of this paper is to present a new technique for computing term weights for
index terms, which leads to a new ranking mechanism, referred to as the set-based model.
The components in our model are no longer terms, but termsets. The novelty is that we 
compute term weights using a data mining technique called association rules, which is 
time efficient and yet yields nice improvements in retrieval effectiveness. The set-based 
model function for computing the similarity between a document a ... 

Keywords: closed association rule mining, data mining, information retrieval models, set 
theory, weighting index term co-occurrences 



Fast detection of communication patterns in distributed executions

Thomas Kunz, Michiel F. H. Seuren

November 1997 Proceedings of the 1997 conference of the Centre for Advanced
Studies on Collaborative research CASCON '97
Publisher: IBM Press

Full text available: pdf(4.21 MB) Additional Information: full citation , abstract , references , index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the 
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex
and do not provide the user with the desired overview of the application. In our
experience, such tools display repeated occurrences of non-trivial commun ... 

18 Summarization: A system for query-specific document summarization

Ramakrishna Varadarajan, Vagelis Hristidis

November 2006 Proceedings of the 15th ACM international conference on Information
and knowledge management CIKM '06
Publisher: ACM Press

Full text available: pdf(709.38 KB) Additional Information: full citation , abstract , references , index terms

There has been a great amount of work on query-independent summarization of 
documents. However, due to the success of Web search engines query-specific document 
summarization (query result snippets) has become an important problem, which has 
received little attention. We present a method to create query-specific summaries by 
identifying the most query-relevant fragments and combining them using the semantic 
associations within the document. In particular, we first add structure to the doc ... 

Keywords: Steiner tree problem, keyword search, query-specific summarization, user 
survey 



19 Document detection: TIPSTER phase I final report

Bill Caid, Stephen Gallant, Joel Carleton, David Sudbeck 

September 1993 Proceedings of a workshop on held at Fredericksburg, Virginia: 
September 19-23, 1993 
Publisher: Association for Computational Linguistics 
Full text available: pdf(1.84 MB) Additional Information: full citation , abstract

During Phase I of the TIPSTER program, HNC developed a unique approach to machine 
learning of similarity of meaning. This approach, embodied in a system called 
"MatchPlus", exploits this learned similarity of meaning for concept-based text retrieval, 
routing and visualization of textual information. MatchPlus uses an information 
representation scheme called "context vectors" to encode similarity of usage. Key 
attributes of the context vector approach are as follows:* Words, documents, and q ... 




IR evaluation methods for retrieving highly relevant documents 

Kalervo Järvelin, Jaana Kekäläinen

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '00 
Publisher: ACM Press 



Full text available: pdf(769.94 KB)

This paper proposes evaluation methods based on the use of non-dichotomous relevance
judgements in IR experiments. It is argued that evaluation methods should credit IR 
methods for their ability to retrieve highly relevant documents. This is desirable from the 
user point of view in modern large IR environments. The proposed methods are (1) a 
novel application of P-R curves and average precision computations based on separate 
recall bases for documents of different degrees of relevance, and (2 ...

Mashhuda Glencross, Alan G. Chalmers, Ming C. Lin, Miguel A. Otaduy, Diego Gutierrez
July 2006 ACM SIGGRAPH 2006 Courses SIGGRAPH '06
Publisher: ACM Press

Full text available: pdf(5.07 MB), mov(68:6 MIN) Additional Information: full citation, abstract, references


The objective of this course is to provide an introduction to the issues that must be 
considered when building high-fidelity 3D engaging shared virtual environments. The 
principles of human perception guide important development of algorithms and 
techniques in collaboration, graphical, auditory, and haptic rendering. We aim to show 
how human perception is exploited to achieve realism in high fidelity environments within 
the constraints of available finite computational resources. In this course w ... 



Keywords: collaborative environments, haptics, high-fidelity rendering, human-computer
interaction, multi-user, networked applications, perception, virtual reality

Additional Information: full citation , abstract , references , citings , index
terms 



Relevance feedback, which modifies queries using judgements of the relevance of a few, 
highly-ranked documents, has historically been an important method for increasing the 
performance of information retrieval systems. In this paper, we extend the inference 
network model introduced by Turtle and Croft to include relevance feedback techniques. 
The difference between relevance feedback on text abstracts and full text collections is 
studied. Preliminary results for relevance feedback on the st ... 




Fast evaluation of structured queries for information retrieval 



Eric W. Brown 

July 1995 Proceedings of the 18th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '95 
Publisher: ACM Press 



Full text available: pdf(1.15 MB)

Information retrieval systems are being challenged to manage larger and larger document
collections. In an effort to provide better retrieval performance on large collections, more
sophisticated retrieval techniques have been developed that support rich, structured 
queries. Structured queries are not amenable to previously proposed optimization 
techniques. Optimizing execution, however, is even more important in the context of large 
document collections. We present a new structured qu ...

This work presents a new approach for ranking documents in the vector space model. The
novelty lies in two fronts. First, patterns of term co-occurrence are taken into account and 
are processed efficiently. Second, term weights are generated using a data mining 
technique called association rules. This leads to a new ranking mechanism called the
set-based vector model. The components of our model are no longer index terms but index
termsets, where a termset is a set of index terms. Termset ... 



Keywords: Information retrieval models, association rule mining, correlation-based
ranking, data mining, weighting index term co-occurrences

October 2001 Proceedings of the tenth international conference on Information and
Publisher: ACM Press

The richness of the XML data format allows data to be structured in a way which precisely
captures the semantics required by the author. It is the structure of the data, however, 
which forms the basis of all XML query languages. Without at least some notion of the 
structure, a user cannot meaningfully query the data. This problem is compounded when
one considers that heterogeneous data adhering to different schema are likely to exist in 
the database(s) being queried. This paper proposes a soluti ...

A unifying semantic distance model for determining the similarity of attribute values

John F. Roddick, Kathleen Hornsby, Denise de Vries 

February 2003 Proceedings of the 26th Australasian computer science conference - 
Volume 16 ACSC '03

Full text available: pdf(222.01 KB)



The relative difference between two data values is of interest in a number of application 
domains including temporal and spatial applications, schema versioning, data 
warehousing (particularly data preparation), internet searching, validation and error 
correction, and data mining. Moreover, consistency across systems in determining such 
distances and the robustness of such calculations is essential in some domains and useful 
in many. Despite this, there is no generally adopted approach to determ ...

Research on multimedia information retrieval (MIR) has recently witnessed a booming
interest. A prominent feature of this research trend is its simultaneous but independent 
materialization within several fields of computer science. The resulting richness of 
paradigms, methods and systems may, on the long run, result in a fragmentation of 
efforts and slow down progress. The primary goal of this study is to promote an 
integration of methods and techniques for MIR by contributing a conceptual model ... 



Keywords: Description logics, fuzzy logics, multimedia information retrieval

This updated course on simulating natural phenomena will cover the latest research and
production techniques for simulating most of the elements of nature. The presenters will 
provide movie production, interactive simulation, and research perspectives on the 
difficult task of photorealistic modeling, rendering, and animation of natural phenomena. 

The course offers a nice balance of the latest interactive graphics hardware-based 
simulation techniques and the latest physics-based simulation techni ... 

11 The OODB path-method generator (PMG) using access weights and precomputed
access relevance 

Ashish Mehta, James Geller, Yehoshua Perl, Erich Neuhold 

February 1998 The VLDB Journal — The International Journal on Very Large Data 
Bases, Volume 7 Issue 1 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(265.48 KB) Additional Information: full citation , abstract , citings , index terms

A path-method is used as a mechanism in object-oriented databases (OODBs) to retrieve
or to update information relevant to one class that is not stored with that class but with
some other class. A path-method is a method which traverses from one class through a
chain of connections between classes and accesses information at another class. However,
it is a difficult task for a casual user or even an application programmer to write
path-methods to facilitate queries. This is because it mig ...

Keywords: Access relevance, Access weight, OODB queries, Object-oriented databases,
Path-method, Traversal algorithms

Methods for information server selection

David Hawking, Paul Thistlewaite

January 1999 ACM Transactions on Information Systems (TOIS), Volume 17 Issue 1
Publisher: ACM Press

Full text available:



The problem of using a broker to select a subset of available information servers in order 
to achieve a good trade-off between document retrieval effectiveness and cost is 
addressed. Server selection methods which are capable of operating in the absence of 
global information, and where servers have no knowledge of brokers, are investigated. A
novel method using Lightweight Probe queries (LWP method) is compared with several 
methods based on data from past query processing, while Random and ... 



Keywords: Lightweight Probe queries, information servers, network servers, server
ranking, server selection, text retrieval

This paper develops a general, formal framework for modeling term dependencies via
Markov random fields. The model allows for arbitrary text features to be incorporated as 
evidence. In particular, we make use of features based on occurrences of single terms, 
ordered phrases, and unordered phrases. We explore full independence, sequential

The paper presents the Position Specific Posterior Lattice, a novel representation of
automatic speech recognition lattices that naturally lends itself to efficient indexing of
position information and subsequent relevance ranking of spoken documents using
proximity. In experiments performed on a collection of lecture recordings (MIT iCampus
data), the spoken document ranking accuracy was improved by 20% relative over the
commonly used baseline of indexing the 1-best output from an automatic ...

John Broglio, James P. Callan, W. Bruce Croft

September 1993 Proceedings of a workshop on held at Fredericksburg, Virginia: 
September 19-23, 1993 
Publisher: Association for Computational Linguistics



Full text available: pdf(1.57 MB) Additional Information: full citation , abstract , references



The TIPSTER project in the Information Retrieval Laboratory of the Computer Science 
Department, University of Massachusetts, Amherst (which includes MCC as a 
subcontractor), has focused on the following goals:* Improving the effectiveness of 
information retrieval techniques for large, full-text databases,* Improving the 
effectiveness of routing techniques appropriate for long-term information needs, and* 
Demonstrating the effectiveness of these retrieval and routing techniques for ... 




Early user-system interaction for database selection in massive domain-specific
online environments

Jack G. Conrad, Joanne R. S. Claussen

January 2003 ACM Transactions on Information Systems (TOIS), Volume 21 Issue 1



Publisher: ACM Press 

Full text available: pdf(845.54 KB) Additional Information: full citation , abstract , references , index terms



The continued growth of very large data environments such as Westlaw and Dialog, in 
addition to the World Wide Web, increases the importance of effective and efficient 
database selection and searching. Current research focuses largely on completely 
autonomous and automatic selection, searching, and results merging in distributed 
environments. This fully automatic approach has significant deficiencies, including reliance 
upon thresholds below which databases with relevant documents are not search ... 



Keywords: Database selection, metadata for retrieval, structuring information to aid 
search and navigation, user interaction

We introduce a method for learning to find documents on the Web that contain answers to
a given natural language question. In our approach, questions are transformed into new 
queries aimed at maximizing the probability of retrieving answers from existing 
information retrieval systems. The method involves automatically learning phrase features 
for classifying questions into different types, automatically generating candidate query 
transformations from a training set of question/answer pairs, and ... 



Keywords: Web search, information retrieval, meta-search, query expansion, question

dependence, and full dependence variants of the model. A novel approach is developed to
train the model that directly maximizes the mean average precision rathe ... 

Keywords: Markov random fields, information retrieval, phrases, term dependence

Chao Liu, Jiawei Han

November 2006 Proceedings of the 14th ACM SIGSOFT international symposium on 
Foundations of software engineering SIGSOFT '06/FSE-14 
Publisher: ACM Press 

Full text available: pdf(728.57 KB) Additional Information: full citation , abstract , references , index terms



Recent software systems usually feature an automated failure reporting system, with 
which a huge number of failing traces are collected every day. In order to prioritize fault 
diagnosis, failing traces due to the same fault are expected to be grouped together. 
Previous methods, by hypothesizing that similar failing traces imply the same fault, 
cluster failing traces based on the literal trace similarity, which we call trace proximity. 
However, since a fault can be triggered in many ways, ... 




Keywords: debugging aids, failure proximity, statistical debugging

Donna Harman
Publisher: Association for Computational Linguistics

Full text available: ^ odfO.ZO MB) Additional Information: full citation , abstract , references 

There have been four Text REtrieval Conferences (TRECs); TREC-1 in November 1992, 
TREC-2 in August 1993, TREC-3 in November 1994 and TREC-4 in November 1995. The 
number of participating systems has grown from 25 in TREC-1 to 36 in TREC-4, including 
most of the major text retrieval software companies and most of the universities doing
research in text retrieval (see table for some of the participants). The diversity of the 
participating groups has ensured that TREC represents many different appro ...

January 1977 Proceedings of the 5th annual ACM computer science conference CSC '77

Publisher: ACM Press 

Full text available: pdf(3.14 MB) Additional Information: full citation , abstract , index terms



One problem in computer program testing arises when errors are found and corrected 
after a portion of the tests have run properly. How can it be shown that a fix to one area
of the code does not adversely affect the execution of another area? What is needed is a 
quantitative method for assuring that new program modifications do not introduce new 
errors into the code. This model considers the retest philosophy that every program 
instruction that could possibly be reached and tested from the ...